Exploratory Analysis of table actor_text_property

Table extract

index  pk_actor_text_property  property_type  lang_iso_code  text                                               notes  fk_actor  creator  modifier  creation_time            modification_time    concat_actp
32155  18639                   notice         fra            Employé de l'octroi de Lyon (1880-1901)            None   40848     24.0     24.0      2010-11-21 21:47:07.000  2013-12-18 15:24:16  AcTP18639
44005  26784                   notice         fra            <p>Vicaire de la paroisse saint-Martin d'Ainay...         44257     39.0     1.0       2012-01-07 19:32:06.450  2021-12-26 15:19:11  AcTP26784
21705  6711                    notice         fra            Diplômé des Arts et Métiers (Châlons, 1869), c...  None   25843     11.0     11.0      2010-01-17 14:38:15.000  2013-12-18 15:24:16  AcTP6711
7181   58166                   complément     fra            <p>Négociant.</p>                                  538    56923     51.0     51.0      2014-11-27 11:33:38.060  NaT                  AcTP58166
14261  63886                   complément     fra            <p> Négociant en vins, vice président de Fédér...  403    56829     51.0     51.0      2014-12-08 16:07:16.510  NaT                  AcTP63886

Discovery

Total number of rows: 53887
Columns contain:
  - "pk_actor_text_property":   0.00% empty - 53887 (100.00%) uniques (eg: 29364; 29366; 17991)
  -          "property_type":   0.00% empty -     4 (  0.01%) uniques (eg: notice; notice_web; complément)
  -                   "text":   0.00% empty - 38518 ( 71.48%) uniques (eg: <p>Directe...; <p>Conseil...; <p>Il a ét...)
  -               "fk_actor":   0.00% empty - 45931 ( 85.24%) uniques (eg: 47735; 47736; 40250)
  -            "concat_actp":   0.00% empty - 53887 (100.00%) uniques (eg: AcTP29364; AcTP29366; AcTP17991)
  -          "creation_time":   0.00% empty - 30407 ( 56.43%) uniques (eg: 2013-12-19...; 2013-12-19...; 2010-11-18...)
  -                "creator":   0.01% empty -    87 (  0.16%) uniques (eg: 2.0; 50.0; 3.0)
  -          "lang_iso_code":   2.79% empty -     6 (  0.01%) uniques (eg: fra; None; ita)
  -               "modifier":  13.57% empty -    82 (  0.15%) uniques (eg: 2.0; 50.0; 3.0)
  -      "modification_time":  42.69% empty -  4417 (  8.20%) uniques (eg: NaT; 2013-12-19...; 2013-12-19...)
  -                  "notes":  60.97% empty -  8515 ( 15.80%) uniques (eg: ; None; 96)

Type parsing

Based on the discovery table above, each column is parsed into its most meaningful type.
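As an illustration only, the parsing step could look like the pandas sketch below. The target types are assumptions inferred from the discovery table (identifiers as integers, low-cardinality columns as categories, `creator` and `modifier` as nullable integers because they contain empty values, and the two time columns as datetimes); the notebook's actual mapping is not shown here. The small `raw` frame is a stand-in built from values in the table extract.

```python
import pandas as pd

# Stand-in for the real table; the values come from the table extract.
raw = pd.DataFrame({
    "pk_actor_text_property": ["18639", "58166"],
    "property_type": ["notice", "complément"],
    "lang_iso_code": ["fra", None],
    "fk_actor": ["40848", "56923"],
    "creator": [24.0, None],
    "modifier": [24.0, 51.0],
    "creation_time": ["2010-11-21 21:47:07.000", "2014-11-27 11:33:38.060"],
    "modification_time": ["2013-12-18 15:24:16", None],
})

# Assumed type mapping (hypothetical, not taken from the notebook).
parsed = raw.astype({
    "pk_actor_text_property": "int64",
    "fk_actor": "int64",
    "property_type": "category",
    "lang_iso_code": "category",
    "creator": "Int64",   # nullable integer: keeps missing values as <NA>
    "modifier": "Int64",
})

# Unparseable or missing timestamps become NaT instead of raising.
for col in ("creation_time", "modification_time"):
    parsed[col] = pd.to_datetime(parsed[col], errors="coerce")

print(parsed.dtypes)
```

The nullable `Int64` type is used here rather than `int64` so that the empty `creator` and `modifier` entries do not force the columns back to floats.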

Columns analysis

This section reports the notable findings for individual columns. The analysis is not exhaustive.

For some columns, the values are also updated.

property_type

The values 'notice web' and 'notice_web' are merged into a single value.

Moreover, according to the wiki page, 'notice_web' would in turn be merged with 'notice'.
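The two merge steps described above can be sketched as a small lookup function. The function name, the flag, and the choice of 'notice_web' and 'notice' as the surviving labels are assumptions made for illustration; the source names the values being merged but not which label is kept.

```python
# Assumed canonical labels (hypothetical): the spelling variant collapses to
# "notice_web", and the optional second step folds "notice_web" into "notice".
MERGE_WEB_VARIANTS = {"notice web": "notice_web"}
MERGE_INTO_NOTICE = {"notice_web": "notice"}


def normalize_property_type(value, merge_into_notice=False):
    """Return the merged property_type label; unknown values pass through."""
    value = MERGE_WEB_VARIANTS.get(value, value)
    if merge_into_notice:
        value = MERGE_INTO_NOTICE.get(value, value)
    return value


print(normalize_property_type("notice web"))
print(normalize_property_type("notice web", merge_into_notice=True))
print(normalize_property_type("complément", merge_into_notice=True))
```

Keeping the second merge behind a flag mirrors the text, which presents it as a follow-up suggested by the wiki page rather than as a step already applied. With pandas, such a function could be applied to the column through `Series.map`.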

text

All HTML tags, non-ASCII characters, and newlines are removed.
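A minimal sketch of such a cleaning step, using only the standard library, is given below. The name `clean_text` is hypothetical, the regular expression is a simplification rather than a full HTML parser, and replacing newlines with a space (so that adjacent words do not fuse) is an assumption. Note that dropping non-ASCII characters also drops accented letters, which matters for the French text in this table.

```python
import re

# Simplified tag matcher: anything between "<" and the next ">".
TAG_RE = re.compile(r"<[^>]+>")


def clean_text(raw: str) -> str:
    """Strip HTML tags, newlines, and non-ASCII characters from a string."""
    no_tags = TAG_RE.sub("", raw)
    # Assumption: newlines become a space instead of being deleted outright.
    no_newlines = no_tags.replace("\r", " ").replace("\n", " ")
    # Characters outside the ASCII range are silently discarded.
    ascii_only = no_newlines.encode("ascii", errors="ignore").decode("ascii")
    return ascii_only.strip()


print(clean_text("<p>Négociant.</p>"))
print(clean_text("<p>first line\nsecond line</p>"))
```

If the accents need to be preserved in a readable form, transliterating them before the ASCII filter (for example with `unicodedata.normalize("NFKD", ...)`) would be one option, but that goes beyond what the text describes.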

creation_time

creator

lang_iso_code